Group 18: R for Bio Data Science final project

Kacper Maciejewski (s243548)
Sofia Russo (s252054)
Aleksandra Wozniak (s253713)
Gabriel Loayza (s252608)
Dagmar Geevers (s252256)

Introduction

Epigenetic modifiers HDAC

  • HDAC gene family: epigenetic modifiers that remove histone acetyl groups and repress gene expression

  • Cancer stemness: HDACs are often overexpressed in tumors and help maintain cancer stem cells and therapy resistance

  • Immunogenicity: HDAC activity can reduce antigen presentation and promote an immunosuppressive tumor microenvironment

HDAC inhibitors

HDAC inhibitors can decrease cancer stemness and increase tumor immunogenicity, improving responses to immunotherapy (Gupta 2025).

Introduction

Project aim

what

Is HDAC gene family expression related to cancer stemness, clinical outcome, and survival?

how

By recreating the visualizations and analyses provided in the following study (Figures 2-4), using the same data and our own R implementations.

The Epigenetic Modifiers HDAC2 and HDAC7 Inversely Associate with Cancer Stemness and Immunity in Solid Tumors, IJMS, (Maciejewski et al. 2024)

why

To verify the conclusions of the paper, since little to no literature has confirmed its findings so far.

Methods

Analysis - project overview

flowchart TD
  A[cBioPortal] --> B[TCGA clinical data]
  A --> C[TCGA expression data]

  %% Malta et al. before mRNAsi
  X[Malta et al. study] --> D[mRNAsi stemness index]

  %% Data cleanup after B, C, and D
  B --> Bclean[Data cleanup]
  C --> Cclean[Data cleanup]
  D --> Dclean[Data cleanup]

  %% Data merge (data join)
  Bclean --> DM[Data merge]
  Cclean --> DM
  Dclean --> DM

  %% After data merge: data augmentation
  DM --> E[Data augmentation]

  E --> F[Descriptive summary]

  E --> G[Correlation analysis]
  G --> H[mRNAsi]
  G --> I[stage/grade]

  E --> J[Survival modeling]
  J --> K[Overall Survival]
  J --> L[Disease Free Survival]
  
  F --> M[Plots & tables]
  H --> M
  I --> M
  K --> M
  L --> M

  %% Styling: white background, black text
  style A fill:#ffffff,stroke:#000000,color:#000000
  style B fill:#ffffff,stroke:#000000,color:#000000
  style C fill:#ffffff,stroke:#000000,color:#000000
  style X fill:#ffffff,stroke:#000000,color:#000000
  style D fill:#ffffff,stroke:#000000,color:#000000
  style Bclean fill:#ffffff,stroke:#000000,color:#000000
  style Cclean fill:#ffffff,stroke:#000000,color:#000000
  style Dclean fill:#ffffff,stroke:#000000,color:#000000
  style DM fill:#ffffff,stroke:#000000,color:#000000
  style E fill:#ffffff,stroke:#000000,color:#000000
  style F fill:#ffffff,stroke:#000000,color:#000000
  style G fill:#ffffff,stroke:#000000,color:#000000
  style H fill:#ffffff,stroke:#000000,color:#000000
  style I fill:#ffffff,stroke:#000000,color:#000000
  style J fill:#ffffff,stroke:#000000,color:#000000
  style K fill:#ffffff,stroke:#000000,color:#000000
  style L fill:#ffffff,stroke:#000000,color:#000000
  style M fill:#ffffff,stroke:#000000,color:#000000

  • Duplicated TCGA expression samples were collapsed by averaging.
  • Spearman correlations, with p-values corrected for multiple testing (Benjamini-Hochberg method).
  • Cox-based risk prediction models comparing high vs. low expression (median split) of each HDAC family member.
  • On all heatmaps, rows and columns follow the order of the original hierarchical clusters.
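The deduplication and median-split steps listed above could look roughly like the sketch below. It is an illustration on made-up data, not the project's code: the table `expr`, its column names (`sample_id`, `gene`, `expression`) and the rule that values equal to the median count as "low" are all assumptions.

```r
library(dplyr)

# Hypothetical toy data: sample "S1" was measured twice for HDAC2
expr <- tibble(
  sample_id  = c("S1", "S1", "S2", "S3", "S4"),
  gene       = "HDAC2",
  expression = c(2, 4, 1, 5, 7)
)

# Collapse duplicated samples by averaging their expression values
expr_dedup <- expr %>%
  group_by(sample_id, gene) %>%
  summarise(expression = mean(expression), .groups = "drop")

# Median split per gene: strictly above the median is "high" (assumed rule)
expr_split <- expr_dedup %>%
  group_by(gene) %>%
  mutate(expr_group = if_else(expression > median(expression), "high", "low")) %>%
  ungroup()
```

Grouping by gene before the split keeps each HDAC family member on its own median, so a highly expressed gene does not push all samples of a weakly expressed one into the "low" group.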

Methods

Data

Stemness index (mRNAsi)

Downloaded from the supplementary materials (Table S1) of Malta et al. (2018).

Expression and clinical data

Restricted to solid tumors; data fetched via the cBioPortal R API (Cerami et al. 2012).

Results

Data distributions

HDAC expression across tumors

Tumor grade across whole dataset

Cancer stemness and grade

Data distributions are as expected: the main HDAC genes have similar median expression levels. Grade and stage (the latter not shown) follow normal-like distributions.

Results

HDAC expression vs grade

HDAC expression vs stage

flowchart TD
  A[pivot_longer] --> B[group_by<br/>by 'cancer' and 'gene']
  B --> C[Spearman]
  C --> D[pivot_wider]
  D --> E[Benjamini-Hochberg correction]
  E --> F[annotations and heatmap]
  
classDef whiteNode fill:#ffffff,stroke:#000000,color:#000000;
class A,B,C,D,E,F whiteNode;

The expression of HDAC family members is associated with clinicopathological features of TCGA solid tumors in a cancer-dependent and gene-specific manner.
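The correlation pipeline in the flowchart can be sketched roughly as follows. This is a minimal illustration on simulated data, not the project's actual code: the toy table, its column names (`cancer`, `grade`, `HDAC2`, `HDAC7`) and the choice to correct all gene-cancer p-values together are assumptions. For brevity the BH correction is applied on the long table before widening; this gives the same adjusted values as correcting after `pivot_wider`, provided the same set of p-values is corrected together.

```r
library(dplyr)
library(tidyr)

set.seed(1)
# Hypothetical wide table: one row per sample, one column per gene
toy <- tibble(
  cancer = rep(c("BRCA", "LUAD"), each = 20),
  grade  = sample(1:3, 40, replace = TRUE),
  HDAC2  = grade + rnorm(40),
  HDAC7  = -grade + rnorm(40)
)

cor_long <- toy %>%
  # One row per sample-gene pair
  pivot_longer(c(HDAC2, HDAC7), names_to = "gene", values_to = "expression") %>%
  group_by(cancer, gene) %>%
  # Spearman correlation per cancer and gene; exact = FALSE because grade has ties
  summarise(
    rho     = cor(expression, grade, method = "spearman"),
    p_value = cor.test(expression, grade, method = "spearman", exact = FALSE)$p.value,
    .groups = "drop"
  ) %>%
  # Benjamini-Hochberg correction across all gene-cancer pairs (assumed scope)
  mutate(p_adj = p.adjust(p_value, method = "BH"))

# Gene-by-cancer matrix of correlation coefficients, ready for a heatmap
rho_wide <- cor_long %>%
  select(cancer, gene, rho) %>%
  pivot_wider(names_from = cancer, values_from = rho)
```

Keeping a long table with `rho`, `p_value` and `p_adj` side by side makes it straightforward to derive both the heatmap fill and the significance annotations from one object.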

Results

HDAC family expression is associated with survival

flowchart TD
  A[standardize expression<br/>& convert status to numerical] --> B[split by cancer]
  B --> D[Cox model<br/>HR & p-values]
  D --> E[create matrices<br/>& reshape to long format]
  E --> G[heatmap and annotation]
  
classDef whiteNode fill:#ffffff,stroke:#000000,color:#000000;
class A,B,D,E,G whiteNode;

HDAC gene expression does not behave consistently within HDAC classes. Its prognostic relevance is present but highly tumor-dependent.
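A rough sketch of the survival workflow in the flowchart is given below, assuming simulated data. The toy cohort, the column names, the cBioPortal-style status strings ("0:LIVING", "1:DECEASED") and the helper `fit_one()` are hypothetical and only illustrate the steps; the project's own implementation may differ.

```r
library(survival)

set.seed(42)
n <- 100
# Hypothetical toy cohort with one gene and overall survival
toy <- data.frame(
  cancer     = rep(c("BRCA", "LUAD"), each = n / 2),
  HDAC2      = rnorm(n),
  time       = rexp(n, rate = 0.1),
  status_chr = sample(c("0:LIVING", "1:DECEASED"), n, replace = TRUE)
)

# Convert status to numeric (1 = event) and standardize expression
toy$status  <- as.numeric(substr(toy$status_chr, 1, 1))
toy$HDAC2_z <- as.numeric(scale(toy$HDAC2))

# Hypothetical helper: fit one Cox model and return HR and p-value
fit_one <- function(df) {
  fit <- coxph(Surv(time, status) ~ HDAC2_z, data = df)
  s <- summary(fit)
  data.frame(
    cancer  = df$cancer[1],
    gene    = "HDAC2",
    hr      = unname(s$coefficients[1, "exp(coef)"]),
    p_value = unname(s$coefficients[1, "Pr(>|z|)"])
  )
}

# Split by cancer, fit per subset, and bind into a long table for plotting
cox_long <- do.call(rbind, lapply(split(toy, toy$cancer), fit_one))
```

With standardized expression, the hazard ratio is read per one standard deviation increase in expression, which keeps values comparable across genes with different expression ranges.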

Results

HDAC family expression vs cancer stemness

  • Class I HDAC family members correlate positively with cancer stemness across TCGA solid tumors, while class IIA HDAC genes correlate negatively.
  • Specifically, HDAC2 (class I) and HDAC7 (class IIA) show the strongest correlations across the studied solid tumors, with statistically significant opposite trends.

Discussion

Inverse association with cancer stemness and immunogenicity

  • This project confirmed a strong, tumor-type–independent association between cancer stemness and elevated class I HDAC expression, especially HDAC2, while class IIA HDACs showed the opposite trend, with HDAC7 most strongly negatively associated.
  • Since cancer stemness and immune infiltration are inversely related (Malta et al. 2018), patients with stem-like, poorly infiltrated tumors and high HDAC2 expression may benefit from combining HDAC2-selective inhibitors with immunotherapy.

Replication crisis

Although the so-called replication crisis in biomedical research is real (Ioannidis 2005), we reproduced the key findings with only minor differences. This is a major success, given the lack of published code, the need to infer minor preprocessing steps, and the fact that we recomputed survival from raw data rather than using existing database results.

Discussion

Technical takeaways

  • We successfully recreated the expected figures and reached the same conclusions, with only minor differences in the results.
  • In future projects we should use try(..., silent=TRUE) with more caution, as the silently suppressed errors made debugging hard.
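One possible alternative to a silent `try()` is a `tryCatch()` wrapper that still lets a loop continue but records what failed. The sketch below is only an illustration: the helper `safe_fit()` and the labels passed to it are hypothetical names, not part of the project.

```r
# Hypothetical helper: run a function, report any error, and return NULL on failure
safe_fit <- function(f, label) {
  tryCatch(
    f(),
    error = function(e) {
      # Report which step failed and why instead of discarding the error
      message("Failed for ", label, ": ", conditionMessage(e))
      NULL
    }
  )
}

ok  <- safe_fit(function() log(10), "BRCA/HDAC2")
bad <- safe_fit(function() stop("too few events"), "LUAD/HDAC7")
```

Returning `NULL` keeps an iteration over cancers and genes running, while the message shows which combination failed and with what error.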

Further steps and improvements

  • Confirming the relationship between HDAC expression and immunological inflammation, which was covered in the reproduced paper but was outside the scope of this project.
  • A single plotting function could have been shared across all figures to ensure consistent aesthetics.
  • pheatmap or ComplexHeatmap would produce better-looking plots with recomputed clustering, but we wanted to stay fully aligned with the course content.

References

Cerami, Ethan, Jianjiong Gao, Ugur Dogrusoz, Benjamin E. Gross, Serdar O. Sumer, Bülent A. Aksoy, Anders Jacobsen, et al. 2012. “The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data.” Cancer Discovery 2 (5): 401–4. https://doi.org/10.1158/2159-8290.CD-12-0095.
Gupta, Preeti. 2025. “Epigenetic Alterations in Cancer: The Therapeutic Potential of Epigenetic Drugs in Cancer Therapy.” Drugs and Drug Candidates 4 (2): 15. https://doi.org/10.3390/ddc4020015.
Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLOS Medicine 2 (8): e124. https://doi.org/10.1371/journal.pmed.0020124.
Maciejewski, Kacper, Marek Giers, Urszula Oleksiewicz, and Patrycja Czerwinska. 2024. “The Epigenetic Modifiers HDAC2 and HDAC7 Inversely Associate with Cancer Stemness and Immunity in Solid Tumors.” International Journal of Molecular Sciences 25 (14): 7841. https://doi.org/10.3390/ijms25147841.
Malta, T. M., A. Sokolov, A. J. Gentles, T. Burzykowski, L. Poisson, J. N. Weinstein, et al. 2018. “Machine Learning Identifies Stemness Features That Correlate with Oncogenic Dedifferentiation.” Cell 173 (2): 338–354.e15. https://doi.org/10.1016/j.cell.2018.03.034.